Smart-pooling et interactomes

Author

  • Nicolas Thierry-Mieg
Abstract

Background: In binary high-throughput screening projects whose goal is to identify low-frequency events, false positives and false negatives are a major concern, beyond the obvious issue of efficiency. Pooling constitutes a natural solution: it reduces the number of tests while providing critical duplication of the individual experiments, thereby correcting for experimental noise. The main difficulty lies in designing the pools in a manner that is both efficient and robust. Few pools should be necessary to correct the errors and identify the positives, yet the experiment should not be too vulnerable to biological unpredictability. For example, some information should still be obtained even if there are slightly more positives or errors than expected. This is known as the group testing problem, or pooling problem.

Results: In this paper, we present a new non-adaptive combinatorial pooling design: the "shifted transversal design" (STD). It relies on arithmetic and rests on two intuitive ideas: minimizing the co-occurrence of objects, and constructing pools with constant-sized intersections. We prove that it allows unambiguous decoding of noisy experimental observations. This design is highly flexible and can be tailored to function robustly in a wide range of experimental settings (i.e., numbers of objects, fractions of positives, and expected error rates). Furthermore, we show that our design compares favorably, in terms of efficiency, with the previously described non-adaptive combinatorial pooling designs.

Conclusion: This method is currently being validated by field-testing in the context of yeast two-hybrid interactome mapping, in collaboration with Marc Vidal's lab at the Dana-Farber Cancer Institute. Many similar projects could benefit from using the Shifted Transversal Design.

Background

With the availability of complete genome sequences, biology has entered a new era.
Building on the sequencing data of genomes, transcriptomes and proteomes, scientists have developed high-throughput screening assays and undertaken a variety of large-scale functional genomics projects. Some projects involve quantitative measurements. Others consist of applying a basic yes-or-no test to a large collection of samples or "objects", such as individuals, clones, cells, drugs, nucleic acid fragments, proteins or peptides. A large class of these binary tests aims at identifying relatively rare events. The main goal is, of course, to obtain information as efficiently and as reliably as possible. Typically, this is achieved by minimizing the cost of the basic assay in time and money, and by automating and parallelizing the experiments as much as possible. A major difficulty stems from the fact that high-throughput biological assays are usually somewhat noisy. Reproducibility is a known problem of microarray analyses, and both false positive and false negative observations are to be expected in binary experiments. These experimental artifacts should be identified and properly treated. A clean way to deal with the issue is to repeat all tests several times, but this is usually prohibitively expensive and time-consuming.

Published: 19 January 2006. BMC Bioinformatics 2006, 7:28. doi:10.1186/1471-2105-7-28. Received: 17 June 2005. Accepted: 19 January 2006. This article is available from: http://www.biomedcentral.com/1471-2105/7/28. © 2006 Thierry-Mieg; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
For binary tests, a more practical approach is to retest all positive results obtained in a first round. This strategy identifies most of the false positives at a reduced cost. It is powerless against false negatives, however, so a better solution is still needed.

For binary experiments testing for rare events, an intuitively appealing strategy is to pool the samples in order to minimize the number of tests. This requires three conditions. First, the objects under scrutiny must be available individually, in a tagged form. For example, a cDNA library in bulk cannot be used, but a collection of cDNA clones or of cloned coding regions, such as the one produced by the C. elegans ORFeome project [1], is suitable. Second, it must be possible to test a pool of objects in a single assay and obtain a positive readout if at least one of the objects is positive. This is the case, for example, when searching for a specific DNA sequence by PCR in a mixture of molecules: a product is amplified if at least one of the pooled molecules contains the target sequence. Third, pooling is especially desirable and efficient when the fraction of expected positives is small (at most a few percent).

Under these conditions, pooling strategies can be applied, and the difficulty then lies in choosing a "good" set of pools. Because this goal is intuitive but rather vague, it must be formalized. A simple formulation, known as the group testing problem (or pooling problem), is the following. Consider a set of n events that can be true or false, represented by n Boolean variables. Let us call "pool" a subset of variables. We define the value of a pool as the disjunction (i.e., the logical OR) of the variables it contains. Let us assume that at most t variables are true.
The goal is to build a set of v pools, where v is small compared to n, such that testing the values of the v pools unambiguously determines the values of the n variables. If the pools must be specified in a single step, rather than incrementally by building on the results of previous tests, the problem is called "non-adaptive". Adaptive designs can require fewer tests. Non-adaptive pooling designs, however, are often better suited to high-throughput screening projects for two reasons. They allow parallelization and facilitate automation of the experiments. Also, the same pools can be used for all targets, which reduces the total project cost.

The ability to deal with noisy observations is an important added benefit of a pooling system over the classical strategy of testing each object individually. Noise detection and correction capabilities are inherent in any pooling system: each variable is present in several pools and is therefore tested several times. The redundancy can be chosen to match the expected noise level. Testing just a few more pools than would be needed in the absence of noise yields robust error correction. Note that minimizing the number of pools and correcting noise are conflicting goals, because increasing noise tolerance generally requires testing more pools. Designing a set of pools therefore requires balancing these two objectives and finding the compromise that suits the application.

Other application-dependent constraints may be imposed. In particular, pool sizes are often limited by the experimental setting. For example, consider the C. elegans protein interaction mapping project led by Marc Vidal [2,3]. With their high-throughput two-hybrid protocol, reliable readouts are estimated to be obtainable with pools containing 400 AD-Y clones, or perhaps up to 1000 by tweaking the assay (Marc Vidal, personal communication).
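The Python sketch below makes this formulation concrete. It represents a non-adaptive design as a fixed list of pools, each a set of variable indices, and computes each pool's value as the OR of its members. It then uses a simple threshold rule to show how redundancy absorbs noise: a variable is called positive if at most `max_errors` of the pools containing it read negative. The function names, the toy 3 × 3 design (rows, columns and wrapped diagonals) and the threshold rule are all illustrative assumptions. This is a generic decoder, not the decoding procedure used for STD.

```python
from typing import List, Sequence, Set


def pool_values(pools: Sequence[Set[int]], truth: Sequence[bool]) -> List[bool]:
    """Value of each pool: the logical OR of the variables it contains."""
    return [any(truth[j] for j in pool) for pool in pools]


def threshold_decode(pools: Sequence[Set[int]], observed: Sequence[bool],
                     n: int, max_errors: int = 0) -> List[bool]:
    """Hypothetical generic decoder (not STD's): call variable j positive if it
    appears in at least one pool and at most `max_errors` of its pools are negative."""
    calls = []
    for j in range(n):
        containing = [i for i, pool in enumerate(pools) if j in pool]
        negatives = sum(1 for i in containing if not observed[i])
        calls.append(bool(containing) and negatives <= max_errors)
    return calls


# Hypothetical toy design: 9 variables on a 3x3 grid (j = 3*r + c), pooled by
# rows, columns and wrapped diagonals, so every variable lies in exactly 3 pools.
rows = [{3 * r + c for c in range(3)} for r in range(3)]
cols = [{3 * r + c for r in range(3)} for c in range(3)]
diags = [{3 * r + c for r in range(3) for c in range(3) if (c - r) % 3 == d}
         for d in range(3)]
pools = rows + cols + diags

truth = [j == 4 for j in range(9)]      # only variable A_4 is positive
observed = pool_values(pools, truth)
observed[1] = False                     # simulate a false negative on the pool of row 1

print(any(threshold_decode(pools, observed, 9, max_errors=0)))  # False: A_4 is missed
print([j for j, v in enumerate(threshold_decode(pools, observed, 9, max_errors=1)) if v])  # [4]
```

In this toy design each variable lies in three pools, so allowing one discordant pool recovers A_4 despite the false negative. Tolerating more errors, including false positives, requires more pools per variable, which is the trade-off described above.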
Many groups have had some success with variants of the simple "grid" design, in which the objects are arrayed on a grid and the rows and columns are pooled (e.g. [4-6]). This rudimentary design is better than no pooling. However, it is vulnerable to noise, behaves poorly when several objects are positive, and is far from optimal in the number of tests. More sophisticated error-correcting pooling designs have been proposed to address these shortcomings. Some of them are very efficient in the number of tests, but lack the robustness and flexibility that most real biological applications require. Others are more adaptable and noise-tolerant, but at the expense of efficiency.

In this paper, we present a new pooling algorithm: the "shifted transversal design" (STD). This design is highly flexible. It can be tailored to identify any number of positive objects and to cope with high noise levels, yet it remains extremely efficient in the number of tests. We show that it compares favorably with the previously described pooling designs.

The paper is organized as follows. After giving a formal definition of STD, we show that it constitutes an error-correcting solution to the pooling problem. We then evaluate the theoretical performance of STD and compare it with the main previously described deterministic pooling designs. Finally, we summarize our results and discuss future directions.

Results (1): the Shifted Transversal Design

Preliminaries

The following notation is used throughout this paper, in accordance with the notation of [7]. Let n ≥ 2, and consider the set $\mathcal{A} = \{A_0, \dots, A_{n-1}\}$ of n Boolean variables. We call a subset of $\mathcal{A}$ a "pool". A pool is "true", or "positive", if at least one of its elements is true. We call a partition of $\mathcal{A}$ a "layer".
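The grid design discussed in the introduction fits these definitions. Arraying n objects on an r × c grid and pooling rows and columns yields two layers, each of which is a partition of the variables. The Python sketch below builds these two layers and checks the partition property. It also shows why the grid copes poorly with several positives: two positives in different rows and columns make two row pools and two column pools positive, and those four pools also implicate the two "cross-term" objects. The function names `grid_layers`, `is_layer` and `candidates` are hypothetical, and a noise-free readout is assumed.

```python
from itertools import product
from typing import List, Set


def grid_layers(n_rows: int, n_cols: int) -> List[List[Set[int]]]:
    """Row pools and column pools of an n_rows x n_cols grid (object j = r*n_cols + c)."""
    rows = [{r * n_cols + c for c in range(n_cols)} for r in range(n_rows)]
    cols = [{r * n_cols + c for r in range(n_rows)} for c in range(n_cols)]
    return [rows, cols]


def is_layer(pools: List[Set[int]], n: int) -> bool:
    """True if the pools partition {0, ..., n-1}: non-empty, pairwise disjoint, covering."""
    seen: Set[int] = set()
    for pool in pools:
        if not pool or seen & pool:
            return False
        seen |= pool
    return seen == set(range(n))


def candidates(n_rows: int, n_cols: int, positives: Set[int]) -> Set[int]:
    """Objects whose row pool and column pool are both positive (noise-free readout)."""
    pos_rows = {j // n_cols for j in positives}
    pos_cols = {j % n_cols for j in positives}
    return {r * n_cols + c for r, c in product(pos_rows, pos_cols)}


layers = grid_layers(4, 5)                        # 20 objects, 4 + 5 = 9 pools
print([is_layer(layer, 20) for layer in layers])  # [True, True]
print(sorted(candidates(4, 5, {0, 12})))          # [0, 2, 10, 12]
```

With the single positive object 7, the readout points only to object 7. With positives 0 and 12, objects 2 and 10 are indistinguishable from the true positives, so further tests are needed.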
Let q be a prime number, with q < n. We define the "compression power" of q relative to n, denoted $\Gamma(q,n)$, as the smallest integer $\gamma$ such that $q^{\gamma+1} \geq n$. We simply write $\Gamma$ for $\Gamma(q,n)$ whenever possible. Let $\sigma_q$ be the mapping of $\{0,1\}^q$ onto itself defined by:

$$\sigma_q : (x_1, x_2, \dots, x_q) \mapsto (x_q, x_1, \dots, x_{q-1})$$

Note that $\sigma_q$ is a cyclic function of order q: $\sigma_q^q$ is the identity function on $\{0,1\}^q$.

The matrix representation

Any set of pools can be represented by a Boolean matrix, as follows. Each column corresponds to one variable, and each row to one pool. Cell (i,j) is true (value 1) if pool i contains variable j, and false (value 0) otherwise.
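The definitions above translate directly into code. The Python sketch below has three parts:

- It computes the compression power Γ(q,n) by searching for the smallest γ with q^(γ+1) ≥ n.
- It implements σ_q as a cyclic shift of a length-q tuple by one position and checks that applying it q times returns the input. A one-position shift is assumed here because it is the natural map that is cyclic of order q; the shift direction is also an assumption.
- It builds the Boolean pool-by-variable matrix of a set of pools.

All function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple


def compression_power(q: int, n: int) -> int:
    """Smallest integer gamma such that q**(gamma + 1) >= n (the text assumes q < n)."""
    if q < 2:
        raise ValueError("q must be a prime, hence at least 2")
    gamma = 0
    while q ** (gamma + 1) < n:
        gamma += 1
    return gamma


def sigma(x: Tuple[int, ...]) -> Tuple[int, ...]:
    """Assumed form of sigma_q: cyclic shift (x_1, ..., x_q) -> (x_q, x_1, ..., x_{q-1})."""
    return x[-1:] + x[:-1]


def pool_matrix(pools: Sequence[Set[int]], n: int) -> List[List[int]]:
    """Boolean matrix: row i is pool i, column j is variable j, cell is 1 if j is in pool i."""
    return [[1 if j in pool else 0 for j in range(n)] for pool in pools]


print(compression_power(13, 10000))   # 3, since 13**3 = 2197 < 10000 <= 13**4
e0 = (1, 0, 0, 0, 0)
print(sigma(e0))                      # (0, 1, 0, 0, 0)
y = e0
for _ in range(5):                    # apply sigma_q q = 5 times
    y = sigma(y)
print(y == e0)                        # True: sigma_q**q is the identity
print(pool_matrix([{0, 2}, {1}], 3))  # [[1, 0, 1], [0, 1, 0]]
```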


Publication date: 2013